fix: remove overly broad acceptance heading pattern#1653
Conversation
Addresses 6 root causes identified in PR #228 post-mortem where the coding agent claimed 42/42 tasks complete when multiple acceptance criteria were unmet: Fix 1 - Require verification PASS before stopping: The stop decision now requires the verifier to return PASS. If verification fails, the agent is re-run to fix gaps (up to 2 attempts). Previously, verification was attempted once and ignored on failure. Fix 2 - Raise confidence thresholds in analyzeTaskCompletion: Keyword match threshold raised from 0.35 to 0.50 for HIGH confidence. Now requires 2+ matching words (not just percentage) to avoid single-word false positives. fileMatch tightened to require 2+ keywords or explicit file references. commitMatch requires 2+ substantive words. Fix 3 - Gate cascade logic for acceptance criteria: cascadeParentCheckboxes now detects acceptance criteria section headings and disables cascading within them. Each acceptance criterion must be independently checked — a checked parent no longer auto-checks children in acceptance sections. Fix 5 - Different verifier context: Verification steps now switch to the alternate agent (codex→claude or claude→codex) to avoid the structural problem where the same model that produced the work also verifies it. Configurable via verifier_agent. Fix 6 - Mechanical scope enforcement: New extractScopePatterns/validateScopeCompliance functions parse file patterns from the scope section and validate the PR diff against them. Scope violations block the tasks-complete stop decision. The verifier prompt now includes a mandatory Scope Check section. Fix 7 - Separate task/acceptance criteria tracking: Tasks and acceptance criteria are now counted independently. The stop decision requires BOTH allTasksDone AND allCriteriaMet. Auto-reconciliation only operates on task checkboxes, never acceptance criteria. Also fixes pre-existing duplicate fixAttemptMax declaration in Counter_Risk. https://claude.ai/code/session_01VtzHmRoYTL2kcxaacDgSqQ
…ing pattern Address two bugs identified in sync PR #244 code review: 1. Verification retry logic never incremented attempt_count, causing infinite retry loops on verification failures. Now both 'verify-acceptance' and 'fix-verification-gaps' reasons increment the counter so the exhausted-retries path is reachable. 2. The /acceptance/i heading pattern was overly broad, matching any heading containing "acceptance" (e.g. "User Acceptance Testing"). Changed to /^acceptance$/i for exact match only. Also updates 3 test expectations to match the stricter task-completion thresholds already in the codebase. https://claude.ai/code/session_01VtzHmRoYTL2kcxaacDgSqQ
The bare /^acceptance$/i pattern matched any heading containing only "Acceptance", which could false-positive on headings like "User Acceptance Testing" or "Acceptance Rate Analysis", incorrectly disabling cascade for unrelated sections. The remaining patterns (/acceptance\s*criteria/i, /definition\s*of\s*done/i, /done\s*criteria/i) are specific enough to correctly identify acceptance criteria sections. Addresses Copilot review comment on sync PR #244. https://claude.ai/code/session_01VtzHmRoYTL2kcxaacDgSqQ
Automated Status SummaryHead SHA: 9af9769
Coverage Overview
Coverage Trend
Top Coverage Hotspots (lowest coverage)
Updated automatically; will refresh on subsequent CI/Docker completions. Keepalive checklistScopeNo scope information available Tasks
Acceptance criteria
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a373a6efde
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| } else if (reason === 'verify-acceptance' || reason === 'fix-verification-gaps') { | ||
| const previousAttemptCount = toNumber(verification?.attempt_count, 0); | ||
| verification = { | ||
| status: runResult === 'success' ? 'done' : 'failed', |
There was a problem hiding this comment.
Do not mark fix-verification runs as verified
This change now treats reason === 'fix-verification-gaps' as a verification attempt and sets verification.status to done on any successful run, but that run is not guaranteed to be an acceptance verification pass (the routing logic still maps fix-* reasons to fix_ci mode in keepalive_prompt_routing.js). Because evaluateKeepaliveLoop stops when verificationDone is true, a successful fixer run can prematurely end the loop without a fresh verifier pass, leaving acceptance gaps unverified.
Useful? React with 👍 / 👎.
| const ACCEPTANCE_HEADING_PATTERNS = [ | ||
| /acceptance\s*criteria/i, | ||
| /acceptance/i, | ||
| /definition\s*of\s*done/i, | ||
| /done\s*criteria/i, |
There was a problem hiding this comment.
Match plain Acceptance headings in cascade guard
Dropping the broad acceptance pattern also removes detection of the supported Acceptance heading alias, so isAcceptanceHeading no longer suppresses cascade in sections titled ## Acceptance. Since section parsing still accepts Acceptance (issue_scope_parser.js aliases), this mismatch allows parent-to-child auto-checking inside acceptance criteria for that heading style, which can incorrectly reduce unchecked counts and let completion logic advance too early.
Useful? React with 👍 / 👎.
🤖 Keepalive Loop StatusPR #1653 | Agent: Codex | Iteration 0/5 Current State
🔍 Failure Classification| Error type | infrastructure | |
Keepalive Work Log (click to expand)
|
There was a problem hiding this comment.
Pull request overview
This PR makes two improvements to the keepalive loop logic: it removes an overly broad acceptance heading pattern that could incorrectly disable checkbox cascade in unrelated sections, and it adds proper verification attempt tracking for the 'fix-verification-gaps' flow.
Changes:
- Removes the overly broad
/acceptance/ipattern fromACCEPTANCE_HEADING_PATTERNSto prevent false positives on headings like "User Acceptance Testing" or "Acceptance Rate Analysis" - Extends verification state tracking to include the 'fix-verification-gaps' reason, ensuring attempt counts are properly incremented when the agent is re-run to address verification failures
- Updates test expectations to reflect stricter confidence thresholds for task completion analysis
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
.github/scripts/keepalive_loop.js |
Removes overly broad acceptance pattern; adds verification attempt tracking for fix-verification-gaps reason |
templates/consumer-repo/.github/scripts/keepalive_loop.js |
Identical changes to main keepalive_loop.js (template sync) |
.github/scripts/__tests__/keepalive-loop.test.js |
Updates test assertions to match stricter confidence thresholds (high → medium for lower keyword matches) |
| @@ -860,7 +860,6 @@ function countCheckboxes(markdown) { | |||
| // must be independently verified, not auto-checked by parent cascade. | |||
| const ACCEPTANCE_HEADING_PATTERNS = [ | |||
| /acceptance\s*criteria/i, | |||
There was a problem hiding this comment.
The PR description states that the removed pattern was /^acceptance$/i (with anchors), but the actual removed pattern shown in the diff and code history is /acceptance/i (without anchors). The pattern without anchors is actually MORE overly broad than the description suggests, since it matches "acceptance" anywhere in the heading text, not just as an exact match. While the fix is correct, the PR description should be updated to accurately reflect which pattern was removed.
Summary
/^acceptance$/iheading pattern fromACCEPTANCE_HEADING_PATTERNS/acceptance\s*criteria/i,/definition\s*of\s*done/i,/done\s*criteria/i) are specific enough to correctly identify acceptance criteria sectionsTest plan
https://claude.ai/code/session_01VtzHmRoYTL2kcxaacDgSqQ